mm/zsmalloc: per-cpu deferred free to accelerate swap entry release#810
Open
blktests-ci[bot] wants to merge 4 commits into linus-master_base from
Conversation
Author
Upstream branch: 6d35786
Force-pushed: 1f0d33a to b1870f6
Add a per-cpu deferred free mechanism to zsmalloc with a callback
interface that lets callers (zram, zswap) customize push and drain
behavior.
Each CPU owns a single-page buffer. The hot path (zs_free_deferred)
writes a value into the current CPU's buffer via the push callback
with preemption disabled: no locks, no atomics. When the buffer
fills, it is swapped with a fresh page from a pre-allocated page
pool, and the full page is queued to a WQ_UNBOUND worker for draining.
The drain worker invokes the drain callback, which performs the actual
expensive work (zs_free, slot_free, etc.) in batch, away from the
original hot path.
Page pool management:
- Pool is pre-allocated at enable time (ZS_DEFERRED_POOL_SIZE pages)
- Full buffers are drained and returned to the pool
- If no free page is available when buffer is full, the push falls
back to synchronous processing by the caller
Signed-off-by: Wenchao Hao <haowenchao@xiaomi.com>
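The push/swap/drain flow above can be sketched as a small userspace model. Everything here is illustrative: the buffer layout, pool handling, and helper names are assumptions standing in for the actual zsmalloc implementation, and queue_for_drain merely counts entries where the real code would hand the page to the WQ_UNBOUND worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_PAGE 512   /* e.g. 4096 / sizeof(void *) on 64-bit */
#define POOL_SIZE 4            /* stands in for ZS_DEFERRED_POOL_SIZE */

struct defer_buf {
    uintptr_t entries[ENTRIES_PER_PAGE];
    size_t count;
};

static struct defer_buf pool[POOL_SIZE];
static bool pool_free[POOL_SIZE] = { true, true, true, true };
static struct defer_buf *cur;          /* this CPU's active buffer */
static size_t drained;                 /* entries handed to the worker */

/* Take a fresh page from the pre-allocated pool, or NULL if exhausted. */
static struct defer_buf *pool_get(void)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool_free[i]) {
            pool_free[i] = false;
            pool[i].count = 0;
            return &pool[i];
        }
    }
    return NULL;
}

/* Stand-in for queuing the full page to the drain worker: count the
 * entries and return the drained page to the pool. */
static void queue_for_drain(struct defer_buf *buf)
{
    drained += buf->count;
    buf->count = 0;
    pool_free[buf - pool] = true;
}

/* Hot path: push one value; returns false to request the synchronous
 * fallback when no replacement page is available. */
static bool zs_free_deferred(uintptr_t val)
{
    if (!cur && !(cur = pool_get()))
        return false;
    cur->entries[cur->count++] = val;
    if (cur->count == ENTRIES_PER_PAGE) {
        struct defer_buf *full = cur;
        cur = pool_get();              /* swap in a fresh page */
        queue_for_drain(full);
    }
    return true;
}
```

The key property the model preserves is that the hot path only appends to a private buffer; the buffer swap happens once per ENTRIES_PER_PAGE frees, and the expensive per-entry work is deferred wholesale.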
Register zswap_deferred_ops to defer the entire zswap_entry_free() to
the WQ_UNBOUND worker. The invalidate hot path only stores the entry
pointer into the per-cpu buffer (512 entries/page). The drain callback
performs the full entry teardown: lru_del, zs_free, memcg uncharge,
cache_free, and stats update. On deferred failure, fall back to
synchronous zswap_entry_free().

Signed-off-by: Wenchao Hao <haowenchao@xiaomi.com>
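The zswap drain side can be sketched as a batch loop over one full page of entry pointers (512 per 4 KiB page on 64-bit, since each slot is a pointer). This is a hypothetical userspace model: the struct field and helper names are illustrative stubs, not zswap's real internals.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of a zswap entry; the real teardown touches the
 * LRU, zsmalloc handle, memcg charge, slab cache, and stats. */
struct zswap_entry_model { int freed; };

static size_t teardowns;

/* Stands in for: lru_del, zs_free, memcg uncharge, cache_free,
 * stats update. */
static void entry_teardown(struct zswap_entry_model *e)
{
    e->freed = 1;
    teardowns++;
}

/* Drain callback: tear down every entry in the full page, in batch,
 * away from the invalidate hot path. */
static void zswap_drain_cb(void **buf, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        entry_teardown(buf[i]);
}
```

Because the hot path stored only a pointer, the drain worker has everything it needs to run the full teardown without revisiting the invalidate context.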
Register zram_deferred_ops with zs_pool_enable_deferred_free() to defer
slot freeing to a WQ_UNBOUND worker. The notify hot path only stores a
u32 slot index into the per-cpu buffer (1024 entries/page). The drain
callback does slot_lock + slot_free + slot_unlock for each index. On
deferred failure (no free page), fall back to synchronous slot_lock +
slot_free + slot_unlock.

Signed-off-by: Barry Song <baohua@kernel.org>
Signed-off-by: Wenchao Hao <haowenchao@xiaomi.com>
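The zram drain callback can be sketched the same way: a 4 KiB page holds 1024 u32 slot indices (4096 / sizeof(u32)), and each index is freed under its slot lock. The slot_lock/slot_free/slot_unlock bodies below are counting stubs, not zram's real helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counting stubs standing in for zram's per-slot lock/free helpers. */
static size_t locks, frees, unlocks;

static void slot_lock(uint32_t idx)   { (void)idx; locks++; }
static void slot_free(uint32_t idx)   { (void)idx; frees++; }
static void slot_unlock(uint32_t idx) { (void)idx; unlocks++; }

/* Drain callback: for each deferred index, take the slot lock, free
 * the slot, and drop the lock -- the same sequence the synchronous
 * fallback would run inline on the hot path. */
static void zram_drain_cb(const uint32_t *idx, size_t nr)
{
    for (size_t i = 0; i < nr; i++) {
        slot_lock(idx[i]);
        slot_free(idx[i]);
        slot_unlock(idx[i]);
    }
}
```

Storing a bare u32 index rather than a pointer doubles the batch size per page compared with the zswap case (1024 vs 512 entries).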
Author
Upstream branch: aa54b1d
Replace four separate flag clear operations in slot_free() with a
single mask write. This reduces redundant read-modify-write cycles on
the same flags word.

Signed-off-by: Wenchao Hao <haowenchao@xiaomi.com>
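The single-mask clear can be illustrated with a small sketch. The flag names and the mask below are assumptions for illustration, not zram's actual flag definitions; the point is that one AND with the complement of a combined mask replaces four separate read-modify-write cycles on the same word.

```c
#include <stdint.h>

/* Illustrative flag bits; the real zram flags differ. */
#define ZRAM_FLAG_A (1u << 0)
#define ZRAM_FLAG_B (1u << 1)
#define ZRAM_FLAG_C (1u << 2)
#define ZRAM_FLAG_D (1u << 3)
#define ZRAM_SLOT_CLEAR_MASK \
	(ZRAM_FLAG_A | ZRAM_FLAG_B | ZRAM_FLAG_C | ZRAM_FLAG_D)

/* One read-modify-write instead of four: clear all four flags in a
 * single mask write, leaving the remaining bits untouched. */
static inline uint32_t slot_clear_flags(uint32_t flags)
{
	return flags & ~ZRAM_SLOT_CLEAR_MASK;
}
```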
Force-pushed: d9623ad to 66ef037
Pull request for series with
subject: mm/zsmalloc: per-cpu deferred free to accelerate swap entry release
version: 3
url: https://patchwork.kernel.org/project/linux-block/list/?series=1091432